Task 1

The data in this task contain information on worldwide detections of two types of mosquitoes, Aedes aegypti and Aedes albopictus. The variables collected include the location and the time of each detection.

1.1

In this task the detections of both types of mosquitoes are presented for the years 2004 and 2013. Figure 1 presents the data collected in 2004 and figure 2 the data collected in 2013.

Figure 1: Detected mosquitoes of the types Aedes aegypti and Aedes albopictus in the world during 2004

Figure 2: Detected mosquitoes of the types Aedes aegypti and Aedes albopictus in the world during 2013

In figure 1, a few countries show high densities of detected mosquitoes: the United States, Brazil, Venezuela and Taiwan.
In the United States, aegypti mosquitoes were detected in the southern area and albopictus mosquitoes in the south-eastern area.
In Brazil, mosquitoes were detected in the east and south, a few of the type albopictus and the rest of the type aegypti.
In Venezuela mosquitoes were detected in the northern areas, all of the type aegypti.
In Taiwan mosquitoes were detected in the southern areas, all of the type albopictus.

In figure 2, two countries show higher densities of detected mosquitoes than in figure 1: Brazil and Taiwan.
In a zoomed-out view of Brazil, it appears that mosquitoes were detected all over the country, with a lower density in the north-western area. Only the type aegypti was detected.
In a zoomed-out view of Taiwan, it appears that mosquitoes of the type aegypti were detected everywhere except in the central and eastern parts of the country. A closer inspection reveals that the density was high mainly in the south-western part of the country and lower elsewhere.
For the rest of the world, the number of detected mosquitoes appears to have decreased.

Figure 2 highlights a perception problem, overplotting, for Brazil and Taiwan. Each detection is represented as a dot, and the size of the dots on the screen stays the same whether the figure is zoomed in or out. This is particularly a problem when inspecting a zoomed-out figure, where the dots overlap and create the illusion that mosquitoes were detected in almost the whole country in 2013.

1.2

The number of detected mosquitoes per country over the whole study period has been calculated as \(Z\). Figure 3 presents the value of \(Z\) for all countries in the world.

Figure 3: Number of mosquitoes detected in each country over the whole study period. The map has an equirectangular projection.

In figure 3 the three countries with the most detected cases are Taiwan with 24837 cases, Brazil with 8501 cases and the United States with 2044 cases. All other countries have considerably fewer detected cases than Taiwan. Mapping the frequencies of detected mosquitoes directly to colour therefore gives almost all other countries a similar colour, so the map conveys little information.
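
A minimal sketch of why the linear colour mapping hides most countries, using the three counts quoted above. It assumes that the colour scale runs linearly from 0 to the largest count; the country with 100 cases is hypothetical and only added for illustration.

```r
# Counts quoted in the text; "Hypothetical" is an invented low-count country.
z <- c(Taiwan = 24837, Brazil = 8501, `United States` = 2044, Hypothetical = 100)

# Position of each country on a linear colour scale from 0 to max(z)
# (assumption: the scale starts at 0).
scale_pos <- z / max(z)
print(round(scale_pos, 3))
```

Under this assumption, any country with a few hundred detections or fewer ends up in the lowest two percent of the scale, which is why such countries are hard to tell apart by colour.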

1.3

In this task the computed value \(\log(Z)\) is used, where \(Z\) was computed in task 1.2.

Figure 4 presents an equirectangular projection with colour mapped to \(\log(Z)\). Figure 5 presents a conic equal area projection with colour mapped to \(\log(Z)\).

Figure 4: Logarithm of the number of mosquitoes detected in each country over the whole study period. The map has an equirectangular projection.

Figure 5: Logarithm of the number of mosquitoes detected in each country over the whole study period. The map has a conic equal area projection.

In figure 4 it is easier to compare the frequencies of detected mosquitoes between countries than in figure 3. Most detections appear to be in North America, South America, Asia and Oceania. It is however more difficult to read off the absolute frequency for each country, since the values have to be transformed back from the log scale.
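
The trade-off can be illustrated with the three counts from task 1.2. This is a small sketch and not part of the lab code; it assumes the natural logarithm, which is what log() computes by default in R.

```r
# Counts quoted in task 1.2.
z <- c(Taiwan = 24837, Brazil = 8501, `United States` = 2044)

# Natural logarithm: the values now lie much closer together.
log_z <- log(z)
print(round(log_z, 2))

# Reading an absolute count off the map requires the inverse transform.
z_back <- exp(log_z)
print(round(z_back))
```

On the log scale the three countries differ by a few units instead of by tens of thousands, but a reader has to apply exp() to a colour bar value to recover a count.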

In figure 5 the projection is conic equal area, which means that the area of each country on the map is proportional to its area in the real world. However, the projection presents the map in a shape that most people are not used to.
In figure 4 the projection is neither equal area nor conformal, which means that neither the areas nor the shapes of the countries on the map are true to the real world. The equirectangular projection is however commonly used for thematic maps and has a familiar layout, which makes figure 4 easier to inspect than figure 5.
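
To give a sense of the area distortion in figure 4, the sketch below computes how much an equirectangular map inflates areas at a few latitudes. It assumes that the standard parallel is the equator, so that east-west distances are stretched by \(1/\cos(\varphi)\) at latitude \(\varphi\) while north-south distances are unchanged. The function name is made up for this illustration.

```r
# Area inflation relative to the equator for an equirectangular map
# (assumption: standard parallel at the equator).
area_inflation <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)
}

lat <- c(0, 30, 60)
print(data.frame(latitude = lat, inflation = round(area_inflation(lat), 2)))
```

Under this assumption, areas at 60 degrees latitude are drawn twice as large as areas of the same size at the equator, whereas an equal area projection keeps the ratio at one everywhere.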

1.4

In this task the area of Brazil has been discretised into a 100x100 grid of cells. In each cell the number of observations has been counted, and the mean longitude and latitude of the observations within the cell have been calculated. The result is presented in figure 6.

Figure 6: Detected mosquitoes in Brazil for the year 2013.

In figure 6 most of the density is along the east coast and in the south. Compared to figure 2, figure 6 makes it easier to compare smaller areas within the country, since the frequency is colour coded.
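
The discretisation used in this task can be sketched on a small synthetic data set. The coordinates below are invented, the grid is 2x2 instead of 100x100, and base R cut() with explicit equal-width breaks stands in for the interval cutting used in the appendix.

```r
# Invented longitude (X) and latitude (Y) values, for illustration only.
pts <- data.frame(
  X = c(-50.0, -49.5, -49.0, -41.0, -40.5, -40.0),
  Y = c(-20.0, -19.5, -19.0, -11.0, -10.5, -10.0)
)

n_bins <- 2  # the report uses 100 in each direction

# Equal-width breaks spanning the observed range of a coordinate.
equal_breaks <- function(v, n) seq(min(v), max(v), length.out = n + 1)

pts$X_int <- cut(pts$X, breaks = equal_breaks(pts$X, n_bins), include.lowest = TRUE)
pts$Y_int <- cut(pts$Y, breaks = equal_breaks(pts$Y, n_bins), include.lowest = TRUE)

# One id per occupied grid cell (empty cells are dropped).
cell_id <- interaction(pts$X_int, pts$Y_int, drop = TRUE)

# Per occupied cell: mean position and number of observations.
cells <- data.frame(
  mean_x = as.vector(tapply(pts$X, cell_id, mean)),
  mean_y = as.vector(tapply(pts$Y, cell_id, mean)),
  Count  = as.vector(tapply(pts$X, cell_id, length))
)
print(cells)
```

Each row of the result corresponds to one dot in a plot such as figure 6: the dot is placed at the mean position of the observations in the cell and coloured by their count, so overlapping observations are summarised instead of drawn on top of each other.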

Task 2

2.1

The data in this task is read and processed with the following code.

# Task 2.1
# Read the income data and the GeoJSON map of the Swedish counties
SCB_file <- read.csv("sweden_income.csv", fileEncoding = "iso-8859-1")
SWE_map <- fromJSON(file="gadm41_SWE_1.json")

# Shorter column names and age-group labels
colnames(SCB_file) <- c("region","age","income")
SCB_file[SCB_file$age == "18-29 years", 2] <- "Young"
SCB_file[SCB_file$age == "30-49 years", 2] <- "Adult"
SCB_file[SCB_file$age == "50-64 years", 2] <- "Senior"

# Reshape to one row per region with one income column per age group
SCB_file <- SCB_file %>%
  pivot_wider(
    names_from = age,
    values_from = income
  )

2.2

In figure 7 violin plots for the age groups “Young”, “Adult” and “Senior” are presented.

Figure 7: Violin plots showing mean income distributions per age group

In figure 7, it can be observed that the “Young” age group has the lowest average income of the three age groups. The median income is fairly similar between the “Adult” and “Senior” groups, but the variance is higher in the “Senior” age group. “Young” and “Adult” each have two outliers, whereas the “Senior” age group has just one outlier, all of which represent mean incomes significantly higher than the median mean income across all age groups. For the age groups “Adult” and “Senior”, the distribution is skewed to the right, while for the age group “Young” the distribution more closely resembles a normal distribution.
The mode of each age group is close to its median.
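
The outliers mentioned above are the points drawn beyond the whiskers of the box inside each violin. A minimal sketch of the usual rule follows, assuming whiskers that reach at most 1.5 times the interquartile range from the box (a common default that has not been checked against the plotting library here) and using invented income values. Quartile conventions differ between tools, so the exact fences may deviate slightly from those in figure 7.

```r
# Invented mean incomes for illustration; not the values from the data set.
income <- c(300, 305, 310, 312, 315, 318, 320, 325, 330, 400)

q <- quantile(income, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences: values outside these limits are flagged as outliers.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

outliers <- income[income < lower | income > upper]
print(outliers)
```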

2.3

In figure 8 the dependence of “Senior” incomes on “Adult” and “Young” incomes is visualised.

Figure 8: Surface plot in Plotly showing dependence of Senior incomes on Adult and Young incomes in various counties

In figure 8, there is a positive trend between “Senior” and both “Young” and “Adult”: a region with a relatively high income for “Young” also has a relatively high income for “Senior”, and the same holds between “Adult” and “Senior”. This positive trend may indicate that regions with better economic conditions tend to benefit all age groups. We think that linear regression can be a suitable model to predict the mean income of “Senior” with “Young” and “Adult” as independent variables, due to the positive correlation among the three age groups.
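
As a rough sketch of the suggested model, a multiple linear regression can be fitted with lm(). The data below are simulated for 21 regions with arbitrarily chosen coefficients and only stand in for the real incomes; the object name toy is made up for this illustration.

```r
set.seed(1)  # reproducible simulated data

n <- 21
Young <- seq(300, 360, length.out = n)
Adult <- 450 + 0.5 * Young + 20 * sin(seq_len(n))
Senior <- 50 + 0.4 * Young + 0.8 * Adult + rnorm(n, sd = 2)
toy <- data.frame(Young, Adult, Senior)

# Senior income modelled with Young and Adult income as explanatory variables.
fit <- lm(Senior ~ Young + Adult, data = toy)
print(coef(fit))
print(summary(fit)$r.squared)
```

With the real data, the same formula could be applied to the reshaped table from task 2.1, which has one column per age group. Whether a linear model is adequate would still have to be checked, for example by inspecting the residuals.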

2.4

Figure 9 shows the distribution of mean incomes among the “Young” age group across different counties and figure 10 shows the distribution of mean incomes among the “Adult” age group.

Figure 9: Incomes of Young in different counties

Figure 10: Incomes of Adults in different counties

In figure 9 it is clear that the Stockholm region has the highest mean income among the regions. It also appears that regions in northern Sweden have, on average, a lower mean income than regions in southern Sweden. The region with the lowest mean income is Västerbotten.

In figure 10, Stockholm also has the highest mean income, and it is also evident that northern Sweden has lower incomes than southern Sweden. The region with the lowest mean income is Gävleborg.

The choropleth maps give new information in that the value for each individual county is visible, which allows the counties to be compared and the outliers to be identified. In figure 7 there were two outliers for the group “Young”; in figure 9 these two outliers can be identified as Stockholm and Halland. For the age group “Adult”, the same two outliers were identified in figure 10.
It is also possible to divide Sweden into different clusters, for example a northern part with lower incomes and a southern part with higher incomes.

2.5

In figure 11 the location of Linköping has been added to the map from figure 10.

Figure 11: Incomes of Adults in different counties with Linköping marked with a red dot

Statement of Contribution

We worked on the tasks individually before the data labs, Duc on task 1 and William on task 2. We later solved the other task and compared our solutions.

Task 1

Most of the text written by Duc.

Task 2

Most of the text written by William.

Appendix

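# Packages assumed from the functions used in this listing
library(plotly)   # plot_mapbox, plot_geo, plot_ly
library(ggplot2)  # cut_interval
library(dplyr)    # filter, select, group_by, summarise, %>%
library(tidyr)    # pivot_wider
library(stringr)  # str_sub
library(rjson)    # fromJSON(file = ...)
library(akima)    # interp

# df_mosq is assumed to already hold the mosquito data;
# the step that reads it is not part of this listing.

# Task 1.1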
df_mosq %>% 
  filter(YEAR==2004) %>%
  plot_mapbox() %>% 
  add_markers(type="scattermapbox",
              y=~Y, 
              x=~X,
              color=~VECTOR)


df_mosq %>% 
  filter(YEAR==2013) %>%
  plot_mapbox() %>% 
  add_markers(type="scattermapbox",
              y=~Y, 
              x=~X,
              color=~VECTOR) 


# Task 1.2
df_z <- as.data.frame(table(df_mosq$COUNTRY))
df_temp <- df_mosq %>% select(COUNTRY, COUNTRY_ID) %>% unique()
df_z <- merge(x=df_z, y=df_temp,
              by.x="Var1",
              by.y="COUNTRY")
colnames(df_z) <- c("country", "freq", "code")


# World map
g <- list(
  projection = list(type = "equirectangular")
)

plot_geo(df_z) %>%
  add_trace(
    # colors (plural) sets the palette; the original passed color twice
    z = ~freq, color = ~freq, colors = "Blues", text = ~country, locations = ~code
  ) %>%
  layout(
    geo = g
  ) %>%
  layout(showlegend = FALSE) %>%
  colorbar(title = "Frequency")


# Task 1.3a
g <- list(
  projection = list(type = "equirectangular")
)
plot_geo(df_z) %>%
  add_trace(
    # colour follows the same log-transformed value as z; colors sets the palette
    z = ~log(freq), color = ~log(freq), colors = "Blues", text = ~country, locations = ~code
  ) %>%
  layout(
    geo = g
  ) %>%
  layout(showlegend = FALSE) %>%
  colorbar(title = "log(frequency)")


# Task 1.3b
g <- list(
  projection = list(type = "conic equal area")
)

plot_geo(df_z) %>%
  add_trace(
    # colour follows the same log-transformed value as z; colors sets the palette
    z = ~log(freq), color = ~log(freq), colors = "Blues", text = ~country, locations = ~code
  ) %>%
  layout(
    geo = g
  ) %>%
  layout(showlegend = FALSE) %>%
  colorbar(title = "log(frequency)")


# Task 1.4
df_brazil <- 
  df_mosq %>% 
  filter(YEAR==2013 & COUNTRY == "Brazil") 

df_brazil$X_int <- cut_interval(df_brazil$X, n=100)
df_brazil$Y_int <- cut_interval(df_brazil$Y, n=100)

data_plot <- 
  df_brazil %>%
  group_by(X_int, Y_int) %>% 
  summarise(mean_x = mean(X),
            mean_y = mean(Y),
            Count = n())

data_plot %>%
  plot_mapbox() %>% 
  add_markers(type="scattermapbox",
              y=~mean_y, 
              x=~mean_x,
              color=~Count) %>%
  layout(showlegend = FALSE) %>%
  colorbar(title = "Frequency")


# Task 2.1
# Read the income data and the GeoJSON map of the Swedish counties
SCB_file <- read.csv("sweden_income.csv", fileEncoding = "iso-8859-1")
SWE_map <- fromJSON(file="gadm41_SWE_1.json")

# Shorter column names and age-group labels
colnames(SCB_file) <- c("region","age","income")
SCB_file[SCB_file$age == "18-29 years", 2] <- "Young"
SCB_file[SCB_file$age == "30-49 years", 2] <- "Adult"
SCB_file[SCB_file$age == "50-64 years", 2] <- "Senior"

# Reshape to one row per region with one income column per age group
SCB_file <- SCB_file %>%
  pivot_wider(
    names_from = age,
    values_from = income
  )


# Task 2.2
plot <- plot_ly() %>%
  add_trace(data = SCB_file, x = ~"Young", y = ~Young,
            type = "violin", name = "Young", box=list(visible=TRUE)) %>%
  add_trace(data = SCB_file, x = ~"Adult", y = ~Adult, 
            type = "violin", name = "Adult", box=list(visible=TRUE)) %>%
  add_trace(data = SCB_file, x = ~"Senior", y = ~Senior,
            type = "violin", name = "Senior", box=list(visible=TRUE)) %>%
  layout(xaxis = list(title = "Age Group",
                      categoryorder = "array", categoryarray = c("Young", "Adult", "Senior")),
         yaxis = list(title = "Mean income"))
plot


# Task 2.3
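# interp() returns z with rows indexed by x (Young) and columns by y (Adult).
# A plotly surface reads the rows of z along the y axis, so z is transposed.
s <- interp(SCB_file$Young, SCB_file$Adult, SCB_file$Senior, duplicate = "mean")
plot2 <- plot_ly(data = SCB_file, 
                 x=~s$x,
                 y=~s$y,
                 z=~t(s$z),
                 type = "surface",
                 colorbar = list(title = "Mean senior Income")) %>%
  layout(scene = list(
           # x holds the Young grid and y the Adult grid
           xaxis = list(title = "Mean young Income"),
           yaxis = list(title = "Mean adult Income"),
           zaxis = list(title = "Mean senior Income")
         ),
         showlegend = FALSE) 
plot2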


# Task 2.4
# removing county ID and "county" from string
SCB_file$region <- SCB_file$region %>%  str_sub(., 4, nchar(SCB_file$region) - 7)

SCB_file[SCB_file$region=="Västra Götaland",1] <- "VästraGötaland"
SCB_file[SCB_file$region=="Örebro",1] <- "Orebro"

# Young
g <- list(fitbounds="locations", visible=FALSE)
plot3 <- plot_geo(SCB_file) %>%
  add_trace(type = "choropleth", geojson = SWE_map, locations = ~region,
            z = ~Young, featureidkey = "properties.NAME_1") %>%
  layout(geo = g, showlegend = FALSE) %>%
  colorbar(title = "Mean Young Income")
plot3

# Adult
plot4 <- plot_geo(SCB_file) %>%
  add_trace(type = "choropleth", geojson = SWE_map, locations = ~region,
            z = ~Adult, featureidkey = "properties.NAME_1") %>%
  layout(geo = g, showlegend = FALSE) %>%
  colorbar(title = "Mean Adult Income")

plot4


# Task 2.5
plot5 <- plot_geo(SCB_file) %>%
  add_trace(type = "choropleth", geojson = SWE_map, locations = ~region,
            z = ~Adult, featureidkey = "properties.NAME_1") %>%
  # Linköping as a red marker; x is longitude and y is latitude.
  # "markers" is not a trace type, so add_markers() is used instead.
  add_markers(x = 15.62565, y = 58.41109, color = I("red"), text = "Linköping") %>%
  layout(geo = g, showlegend = FALSE) %>%
  colorbar(title = "Mean Adult Income")
 
plot5